    An Optimal k Nearest Neighbours Ensemble for Classification Based on Extended Neighbourhood Rule with Feature Subspace

    To minimize the effect of outliers, kNN ensembles identify a set of observations closest to a new sample point and estimate its unknown class by majority voting over the labels of the training instances in that neighbourhood. Ordinary kNN-based procedures determine the k closest training observations in a neighbourhood region (enclosed by a sphere) using a distance formula. This procedure may fail when test points follow the pattern of nearest observations that lie along a path not contained in the given sphere of nearest neighbours. Furthermore, these methods combine hundreds of base kNN learners, many of which might have high classification errors, resulting in poor ensembles. To overcome these problems, an optimal extended neighbourhood rule based ensemble is proposed, where the neighbours are determined in k steps. The rule starts from the sample point nearest to the unseen observation; the second data point is the one closest to the previously selected point, and this process continues until the required k observations have been obtained. Each base model in the ensemble is constructed on a bootstrap sample in conjunction with a random subset of features. After building a sufficiently large number of base models, the optimal models are selected based on their performance on out-of-bag (OOB) data.
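
    The extended neighbourhood rule and the OOB-based model selection described above can be sketched as follows. This is a minimal illustration in plain NumPy, assuming Euclidean distance; the function names (extended_neighbourhood, exnrule_vote, build_ensemble) and the choice to retain the best-performing half of the models are assumptions for illustration, not the authors' implementation.

```python
import numpy as np
from collections import Counter

def extended_neighbourhood(X, x_query, k):
    """Pick k training points in k steps: the first is nearest to the
    query; each subsequent point is nearest to the previously selected
    one. Points are chosen without replacement."""
    remaining = np.arange(len(X))
    anchor, chain = x_query, []
    for _ in range(k):
        d = np.linalg.norm(X[remaining] - anchor, axis=1)  # Euclidean distance (assumed)
        j = remaining[np.argmin(d)]
        chain.append(j)
        anchor = X[j]  # the next search starts from the point just added
        remaining = remaining[remaining != j]
    return chain

def exnrule_vote(X, y, x_query, k):
    """Estimate the class of x_query by majority voting over the labels
    of its extended neighbourhood."""
    idx = extended_neighbourhood(X, x_query, k)
    return Counter(y[idx]).most_common(1)[0][0]

def build_ensemble(X, y, n_models, n_feats, k, seed=0):
    """Grow base learners on bootstrap samples with random feature
    subsets, then rank them by out-of-bag (OOB) error."""
    rng = np.random.default_rng(seed)
    n, models = len(X), []
    for _ in range(n_models):
        boot = rng.integers(0, n, n)                        # bootstrap sample (with replacement)
        feats = rng.choice(X.shape[1], n_feats, replace=False)
        oob = np.setdiff1d(np.arange(n), boot)              # observations left out of the bootstrap
        if len(oob) == 0:                                   # degenerate resample; skip it
            continue
        err = np.mean([exnrule_vote(X[boot][:, feats], y[boot],
                                    X[i, feats], k) != y[i] for i in oob])
        models.append((err, boot, feats))
    models.sort(key=lambda m: m[0])                         # lowest OOB error first
    return models[:max(1, n_models // 2)]                   # retained fraction is an assumption
```

    A prediction for a new observation would then pool the votes of the retained models, with each base model restricted to its own bootstrap sample and feature subset.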

    Bar-plots of error rates of the proposed and the other classical methods on various subsets of genes for the TumorC dataset.

    Brief description of the datasets along with the corresponding number of features, observations, class-wise distributions and sources.

    Classification error rates produced by different methods on simulated data.

    Box-plots of the error rates produced by random forest, using the top 10 features selected by different feature selection methods for the Colon dataset.

    Box-plots of the error rates produced by random forest, using the top 10 features selected by different feature selection methods for the Breastcancer dataset.

    Box-plots of the error rates produced by random forest, using the top 10 features selected by different feature selection methods for the DLBCL dataset.

    Classification error rates produced by different methods on various subsets of genes.

    Bar-plots of error rates of the proposed and the other classical methods on various subsets of genes for the Lungcancer dataset.

    Bar-plots of error rates of the proposed and the other classical methods on various subsets of genes for the Prostate dataset.